Abstract

Background: The metabolic capacity for nitrogen fixation is known to be present in several prokaryotic species 
scattered across taxonomic groups. Experimental detection of nitrogen fixation in microbes requires species-specific 
conditions, making it difficult to obtain a comprehensive census of this trait. The recent and rapid increase in the 
availability of microbial genome sequences affords novel opportunities to re-examine the occurrence and 
distribution of nitrogen fixation genes. The current practice for computational prediction of nitrogen fixation is to 
use the presence of the nifH and/or nifD genes. 

Results: Based on a careful comparison of the repertoire of nitrogen fixation genes in known diazotroph species we 
propose a new criterion for computational prediction of nitrogen fixation: the presence of a minimum set of six 
genes coding for structural and biosynthetic components, namely NifHDK and NifENB. Using this criterion, we 
conducted a comprehensive search in fully sequenced genomes and identified 149 diazotrophic species, including 
82 known diazotrophs and 67 species not known to fix nitrogen. The taxonomic distribution of nitrogen fixation in 
Archaea was limited to the Euryarchaeota phylum; within the Bacteria domain we predict that nitrogen fixation 
occurs in 13 different phyla. Of these, seven phyla had not hitherto been known to contain species capable of 
nitrogen fixation. Our analyses also identified protein sequences that are similar to nitrogenase in organisms that do 
not meet the minimum-gene-set criteria. The existence of nitrogenase-like proteins lacking conserved co-factor 
ligands in both diazotrophs and non-diazotrophs suggests their potential for performing other, as yet unidentified, 
metabolic functions. 

Conclusions: Our predictions expand the known phylogenetic diversity of nitrogen fixation, and suggest that this 
trait may be much more common in nature than is currently thought. The diverse phylogenetic distribution of 
nitrogenase-like proteins indicates potential new roles for anciently duplicated and divergent members of this 
group of enzymes. 




Genomics 



Background 

Biological nitrogen fixation is the major route for the 
conversion of atmospheric nitrogen gas (N2) to ammonia 
[1]. However, this process is thought to be limited to a 
small subset of prokaryotes named diazotrophs, which 
have been identified in diverse taxonomic groups [2]. 
This biochemical pathway is only manifested when species- 
specific metabolic and environmental conditions are met, 
thus making it difficult to develop a standard screen for 
detection of this biological reaction [3,4]. The complications 
in experimentally detecting nitrogen fixation may be a rea- 
son for the relatively low number and relatively sparse dis- 
tribution of known diazotrophic species. 

All known diazotrophs contain at least one of the three 
closely related sub-types of nitrogenase: Nif, Vnf, and 
Anf. Despite differences in their metal content, these 
nitrogenase sub-types are structurally, mechanistically, 
and phylogenetically related. Their catalytic components 
include two distinct proteins: dinitrogenase (comprising 
the D and K component proteins) and dinitrogenase re- 
ductase (the H protein) [1,2]. The only known exception 
to this rule is the superoxide-dependent nitrogenase 
from Streptomyces thermoautotrophicus, whose protein 
sequence is unknown [5]. 

The best studied sub-type is the molybdenum- 
dependent (Mo-dependent) nitrogenase, the structural 
components of which are encoded by nifH, nifD, and 
nifK [1]. The two other sub-types of nitrogenase, known 
as alternative nitrogenases, are enzyme homologs with 
the exception of an additional subunit (G) in the dinitro- 
genase component and the absence of the heteroatom 
Mo. The vanadium-dependent nitrogenases are encoded 
by vnfH, vnfD, vnfG, and vnfK. The members of the third 
sub-type, the iron-only nitrogenases, are devoid of Mo 
and V, and their components are products of anfH, anfD, 
anfG, and anfK. High levels of protein sequence identity 
among analogous subunits across the nitrogenase sub- 
types allow investigation of the biodiversity in nitrogen 
fixation using NifH (similar to VnfH and AnfH) and/or 
NifD (similar to VnfD and AnfD) as markers. Most 
phylogenetic studies of nitrogen fixing organisms have 
used only NifH and/or NifD sequences as queries to as- 
sess diversity [4,6-8]. 

The high level of complexity of nitrogenase metal- 
loclusters results in a laborious pathway for the assembly 
and insertion of the active site metal-cofactor, FeMoco, 
into dinitrogenase. Apart from the catalytic components, 
additional gene products are required to produce a fully 
functional enzyme [9]. Although the number of proteins 
involved in the activation of nitrogenase seems to be spe- 
cies-specific and varies according to the physiology of 
the organism and environmental niche [10,11], so far 
over a dozen genes have been identified as being 
involved in this process. Despite variations in the precise 
inventory of proteins required for nitrogen fixation, it is 
well acknowledged that the separate expression of the 
catalytic components is not enough to sustain nitrogen 
fixation, thus indicating that the FeMoco biosynthetic 
enzymes play a crucial role in dinitrogenase activation 
[12]. 

In the last few years, substantial advances have been 
made in the functional assignment of individual gene 
products involved in the biosynthesis of FeMoco in 
Azotobacter vinelandii [9,12,13]. The current biosyn- 
thetic scheme involves a consortium of proteins that 
assembles the individual components, iron and sulfur, 
into Fe-S cluster modules for subsequent transformation 
into precursors of higher nuclearity, and addition of the 
heteroatom (Mo) and organic component (homocitrate). 
The synthesis of FeMoco is completed in a so-called 
scaffold protein, NifEN, and shuttled to the final target 
by cluster carrier proteins. Interestingly, the scaffold 
NifEN has amino acid sequence similarity to NifDK [14]. 

The recent growth of genomic databases now includ- 
ing nearly 2,000 completed microbial genomes motivated 
us to re-evaluate the diversity of species capable of 
nitrogen fixation. Identification of co-occurring nitrogen 
fixation genes in species known to fix nitrogen 
enabled us to identify novel potential diazotrophs based 
on their genetic makeup. Our findings expand the 
expected occurrence of nitrogen fixation and the 
biodiversity of diazotrophs. In addition, we have identified a 
large number of phylogenetically diverse nitrogenase-like 
proteins that may represent ancestral forms of the 
enzyme and may have evolved to perform other metabolic 
functions. 

Results 

Species containing NifD and NifH-like sequences 

The rapid expansion of microbial genome sequencing 
in the last few years affords novel opportunities to 
re-examine the distribution of nitrogen fixation genes. In 
this work, we searched the fully sequenced microbial 
genomes available in GenBank [15] for coding sequences 
similar to NifD and NifH. The initial search included 1002 
distinct archaeal and bacterial species with fully sequenced 
genomes, 174 of which contained sequences similar to both 
NifH and NifD. Literature 
searches on these species indicated that nitrogen fix- 
ation has not been experimentally demonstrated in 
more than half of these (92 out of 174), thus suggest- 
ing that the phylogenetic distribution of diazotrophs is 
wider than currently known. Based on the literature 
survey (Additional file 1: Table S1), we classified 
species with hits into two categories: (1) known 
diazotrophs - with experimental demonstration, and (2) 
potential diazotrophs - with no reports of experimen- 
tal demonstration. Interestingly, during this literature 
search we found three recent reports providing ex- 
perimental demonstration of diazotrophy motivated by 
an initial genomic identification of putative nitrogen 
fixation genes [16-18]. 

Identification of a minimum gene set 

The crucial involvement of the FeMoco biosynthesis 
enzymes prompted us to analyze the occurrence of 
nine additional nif genes in known diazotrophic spe- 
cies encoding NifK, NifE, NifN, NifB, VnfG, NifQ, 
NifV, NifS, and NifU. The involvement of eight of 
these proteins in FeMo-cofactor synthesis and nitro- 
genase maturation has been determined [3,9,12]. The 
co-occurrence of additional nif genes varied from spe- 
cies to species [19,20]. These differences in genetic 
requirements most probably reflect variations in meet- 
ing the physiological demands associated with nitro- 
gen fixation and in species-specific metabolic and 
environmental life styles. Nevertheless, the identifica- 
tion of relevant hits (listed in the Additional file 2: 
Table S2) revealed that nearly all known diazotrophs 
contain a minimum of six conserved genes: nifH, nifD, 
nifK, nifE, nifN, and nifB (Figure 1). The co-occurrence 
of these six nif genes, known to be essential for nitrogen 
fixation in characterized systems, has led us to propose a 
requirement for a minimum gene set for nitrogen fixation that 
can be used as an in silico search tool for the identification 
of additional diazotrophs. We did find a few exceptions to 
this minimum gene set rule, and they are discussed below. 

Figure 1 Genes involved in nitrogen fixation. Top: A. vinelandii nif gene regions. Gray-shaded trapezoids are essential genes in Mo-dependent 
nitrogen fixation that were used as queries for the in silico identification of nitrogen fixing species described in this study. Bottom: The proposed 
minimum set of genes required for nitrogen fixation. All species with sequenced genomes that are known diazotrophs and all the species 
proposed to be diazotrophs based on genetic content contain the minimum gene set. 

Our investigation showed that a clustered genomic ar- 
rangement of nif genes was a recurring feature in known 
diazotrophic genomes. In several species the minimum 
gene set was located in a single genomic region. In all 
cases, at least three out of the six genes contained in the 
minimum set were in contiguous gene regions. Most 
often, nifHDK were clustered, but in some other cases, 
nifDK was adjacent to nifEN. Nevertheless, the genomic 
synteny of nif genes across nitrogen-fixing species facili- 
tated in silico assignments of putative sequences involved 
in nitrogen fixation. 
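
The clustering observation above (at least three of the six minimum-set genes lying in a contiguous region) can be expressed as a simple check. The sketch below is illustrative only: it assumes each genome region is available as a list of gene labels in chromosomal order, and it interprets "contiguous" as an uninterrupted run of minimum-set genes. Both the data representation and the example gene orders are hypothetical and are not taken from the study.

```python
# Minimum gene set named in the text (nifHDK plus nifENB).
MINIMUM_GENE_SET = frozenset({"nifH", "nifD", "nifK", "nifE", "nifN", "nifB"})


def longest_contiguous_run(ordered_genes):
    """Length of the longest run of consecutive minimum-set genes.

    `ordered_genes` lists gene labels in chromosomal order; any label
    outside the minimum set breaks the run.
    """
    best = current = 0
    for gene in ordered_genes:
        if gene in MINIMUM_GENE_SET:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def has_clustered_core(ordered_genes, min_run=3):
    """True when at least `min_run` minimum-set genes are adjacent."""
    return longest_contiguous_run(ordered_genes) >= min_run


# Hypothetical gene orders, for illustration only.
region_a = ["orf1", "nifH", "nifD", "nifK", "orf2", "nifE", "nifN", "orf3", "nifB"]
region_b = ["nifH", "orf1", "nifD", "orf2", "nifK"]

print(has_clustered_core(region_a))  # nifHDK are adjacent
print(has_clustered_core(region_b))  # no run of three
```

A real analysis would work from annotated genome coordinates and tolerate short intervening open reading frames; the threshold `min_run=3` simply mirrors the "three out of six" observation.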

Identification of new diazotrophs 

We identified potential diazotrophic species by computa- 
tional searches using the minimum gene set (Additional file 
2: Table S3). We identified 92 species containing coding 
sequences similar to NifD and NifH, 67 of which met the 
minimum gene set criteria (i.e. their genome contained at 
least nifH, nifD, nifK, nifE, nifN, and nifB). Based on gene 
content, we propose that these 67 species have the capacity 
for nitrogen fixation. 
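
As a minimal sketch of how the minimum gene set criterion could be applied to a table of per-genome hits, the snippet below tests whether all six genes are present and reports which are missing. The genome names and gene lists are hypothetical placeholders, and the step that assigns a coding sequence to a nif gene (similarity searching and manual curation in this study) is assumed to have been done already.

```python
# Minimum gene set proposed in the text: structural (nifHDK)
# and biosynthetic (nifENB) genes.
MINIMUM_GENE_SET = frozenset({"nifH", "nifD", "nifK", "nifE", "nifN", "nifB"})


def missing_minimum_genes(genes_found):
    """Return the minimum-set genes absent from a genome's nif-like hits."""
    return MINIMUM_GENE_SET - set(genes_found)


def meets_minimum_gene_set(genes_found):
    """True when all six genes of the minimum set are present."""
    return not missing_minimum_genes(genes_found)


# Hypothetical genomes (illustrative only, not data from the study).
genomes = {
    "genome_full": ["nifH", "nifD", "nifK", "nifE", "nifN", "nifB", "nifV"],
    "genome_no_EN": ["nifH", "nifD", "nifK", "nifB", "nifV"],
    "genome_orphan_H": ["nifH"],
}

predicted = sorted(
    name for name, genes in genomes.items() if meets_minimum_gene_set(genes)
)
print(predicted)
print(sorted(missing_minimum_genes(genomes["genome_no_EN"])))
```

Reporting the missing genes, rather than only a yes/no answer, makes it easy to spot genomes such as those lacking only NifN or NifEN, which are treated separately in the text.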

Biodiversity of nitrogen fixing species 

The taxonomic distribution of diazotrophs identified 
through computational assignment suggests that nitrogen 
fixation is more taxonomically diverse than previously 
recognized. Prior to this work, 
known bacterial diazotrophs were found in six taxonomic 
phyla: Actinobacteria, Chlorobi, Chloroflexi, Cyanobac- 
teria, Firmicutes and Proteobacteria (Figure 2 - gray bars). 
Our study resulted in the identification of potential diazo- 
trophs within the already identified phyla and added seven 
new phyla (Figure 2 - black bars). Thus, despite the avail- 
ability of few representatives in these other seven phyla 
(Figure 2), applying the minimum gene set criteria has 
expanded the biodiversity of this metabolic trait by 
approximately two-fold. No potential diazotrophs were 
identified in Acidobacteria (5), Deinococcus-Thermus 
(13), Dictyoglomi (2), Elusimicrobia (1), Fibrobacteres (1), 
Gemmatimonadetes (1), Planctomycetes (5), Synergistetes 
(2), Tenericutes (29), Thermotogae (12), 
Thermodesulfobacteria (3), and Thermomicrobia (1) (in parentheses, the 
number of species in each group with fully sequenced 
genomes). The lack of diazotrophs within these phyla could 
be attributed to the under-representation of sequenced 
genomes in these taxonomic groups. In contrast to Bacteria, 
nitrogen fixation in Archaea is contained only within 
the phylum Euryarchaeota, where we identified seven 
species as potential diazotrophs. 

Figure 2 Taxonomic diversity of nitrogen fixing species. Species 
with fully sequenced genomes (999 Bacteria and 93 Archaea 
genomes) were analyzed for the minimum set of nitrogen fixation 
ortholog genes. Taxonomic distribution of diazotrophic species 
based on experimental evidence (gray bars) and in silico prediction 
of nitrogen fixation (black bars) is displayed by phylum. The ratio of 
the number of proposed species versus the number of total distinct 
species with sequenced genomes within each phylum is indicated. 

Sporadic occurrence of alternative nitrogenase 

The presence of an additional subunit, AnfG or VnfG 
(Additional file 2: Table S2, Additional file 2: Table S3) 
and distinct sequence features of alternative nitrogenases 
allowed us to distinguish the Mo-dependent enzymes 
from the alternative systems [3,21]. The genomes of 
most diazotrophs encode only one copy of the Mo- 
dependent sub-type of nitrogenase (134 out of 149 
species). Exceptions were species containing additional 
sub-types (Vnf and/or Anf), such as the well-studied 
A. vinelandii and Rhodopseudomonas palustris, as well 
as Dickeya dadantii, Chloroherpeton thalassium, Metha- 
nobacterium sp., Paludibacter propionicigenes, Rhodomi- 
crobium vannielii, and Syntrophobotulus glycolicus. 
Unexpectedly, selected Alphaproteobacteria species, 
including Rhizobium etli and Sinorhizobium fredii, encoded 
two putative copies of Mo-dependent nitrogenase, where 
one copy of nifHDK is clustered with nifEN and the other 
copy only has genes similar to the catalytic components 
nifHDK. As previously proposed [10], alternative nitro- 
genases were only found in species containing genes cod- 
ing for the Mo-dependent enzyme. This finding suggests 
that the hierarchy of expression of Mo-dependent over al- 
ternative nitrogenase, observed in A. vinelandii, may be 
universal to all species containing alternative nitrogenases 
[10]. 
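
The use of the extra G subunit as a marker for alternative nitrogenases can be sketched as follows. This is a deliberately simplified illustration: it relies only on gene names (vnfG, anfG and the nifHDK structural genes), whereas the analysis described above also drew on distinct sequence features. The function names and example gene list are hypothetical.

```python
def nitrogenase_subtypes(genes_found):
    """Infer which nitrogenase sub-types a gene inventory suggests.

    Mo-dependent: nifH, nifD and nifK all present.
    V-dependent:  vnfG present (the additional G subunit).
    Fe-only:      anfG present (the additional G subunit).
    """
    genes = set(genes_found)
    subtypes = []
    if {"nifH", "nifD", "nifK"} <= genes:
        subtypes.append("Mo-dependent (Nif)")
    if "vnfG" in genes:
        subtypes.append("V-dependent (Vnf)")
    if "anfG" in genes:
        subtypes.append("Fe-only (Anf)")
    return subtypes


def alternative_without_mo(genes_found):
    """Flag an inventory with an alternative sub-type but no Mo-dependent one."""
    subtypes = nitrogenase_subtypes(genes_found)
    return bool(subtypes) and "Mo-dependent (Nif)" not in subtypes


# Hypothetical inventory carrying all three sub-types.
example = ["nifH", "nifD", "nifK", "nifE", "nifN", "nifB", "vnfG", "anfG"]
print(nitrogenase_subtypes(example))
print(alternative_without_mo(example))
```

Under the observation reported above, `alternative_without_mo` would be expected to return False for every genome in the data set; a True result would mark a genome worth inspecting by hand.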

Phylogenetically distinct NifDK enzymes are present in 
thermophilic strains lacking a defined FeMoco 
biosynthesis pathway 

Our analysis of nif gene content revealed 28 strains 
that did not meet the minimal gene set criteria be- 
cause they lacked either NifN or both NifE and NifN. 
Nevertheless, some of the hyperthermophilic represen- 
tatives of this class, for example, the deep-sea vent 
archaeon Methanocaldococcus sp. FS406-22, have been 
demonstrated to fix nitrogen [22]. To further analyse 
the properties of the putative nitrogenases encoded by 
this class, we examined the environment of the 
FeMoco ligands in 15 NifD proteins, which we refer 
to collectively as group C. NifDK homologs belonging 
to this group possess the conserved Cys residues 
required for liganding a P cluster, and the NifD 
component contains the FeMoco ligands αCys275 and 
αHis442. The NifD subunits also contain the 
equivalents of αGln191 and αHis195 that are important for 
nitrogen reduction, and in addition, the homocitrate 
"anchor ligand" αLys426. Previous analysis identified 
two distinct subfamilies of NifD proteins (indicated as 
A and B in Figure 3) characterised by distinctive 
sequences surrounding their FeMoco ligands at 
αCys275 and αHis442 [23]. Group C represents a third 
subfamily, containing Gln at position 276, Asp at position 
440 and lacking a residue corresponding to the aromatic 
amino acid found at position 444 in the A and B 
subfamilies (Figure 3). Sequences in the C group are distinct 
from the alternative nitrogenase VnfD and AnfD subunits, 
which contain a conserved Ala at position 276, and a His 
residue replacing an acidic amino acid at position 445 
(indicated as Group V in Figure 3). 

The division of NifDK into three primary lineages, dis- 
tinct from AnfD/VnfD/AnfK/VnfK is supported by 
phylogenetic analysis ([24] and Additional file 3: Figure S1). 
The existence of two lineages within conventional NifDK 
proteins has been shown to correlate with the domain 
structure of NifB in Bacterial and Archaeal proteins [25]. 
The third lineage (denoted as C in Additional file 3: 
Figure S1), entirely comprised of representatives of the 
Archaea and Firmicutes, appears to correlate with the ab- 
sence of NifN and the sequence environment of the co- 
factor ligands in NifD. Notably the NifDK homologs in this 
lineage are all derived from thermophiles with the exception 
of Methanococcus aeolicus Nankai-3, which possesses both 
NifE and NifN. Two other NifDK sequences listed in 
the C group (Additional file 2: Table S3) are derived 
from the diazotrophic methanogens, Methanobacterium 
thermoautotrophicum Delta H, and Methanococcus 
maripaludis S2, which also encode nifE and nifN. The 
latter two NifDK proteins belong to a distinct group 
(labelled M in Additional file 3: Figure S1) that is considered 
to have emerged before all other nitrogenase proteins [24]. 
Thermophilic Roseiflexus species that lack both NifE and 
NifN also belong to a separate phylogenetic group 
(labelled R in Additional file 3: Figure S1). In conclusion, 
there is evidence for nitrogen fixation in species lacking 
nifN, but this appears to be associated with a thermophilic 
lifestyle and the presence of a phylogenetically distinct 
form of nitrogenase. Although this represents a clear ex- 
ception to the minimal gene set, it appears to be a special 
case connected with the need to fix nitrogen in extreme 
environments. 

Figure 3 Alignment of residues flanking conserved FeMoco ligands in NifD/VnfD/AnfD proteins. Alignment of residues flanking the 
conserved co-factor ligands, Cys275 and His442, in the alpha subunit of Mo-dependent and alternative nitrogenases. (The sequence numbering 
refers to A. vinelandii NifD, Avin_01390.) Protein groups labeled A and B correspond to subfamilies 2 and 1 respectively, previously identified by 
Kechris et al. [23]. Group C represents the additional subfamily described in the text. Group V corresponds to AnfD and VnfD sequences. 

Nitrogenase-like sequences 

During our search for nitrogenases we encountered a 
large number of proteins that appeared to be distantly 
related to the alpha and beta subunits of nitrogenase, but 
nevertheless belong to the Pfam nitrogenase component 
1 type oxidoreductase family (PF00148). This Pfam fam- 
ily currently contains 2561 sequences, although a large 
proportion of these show similarity to the B and N subu- 
nits of the light-independent chlorophyllide reductase 
(DPOR), which is structurally related to Mo-Fe protein 
of nitrogenase. This enzyme does not contain a hetero- 
metal cluster analogous to FeMoco within its active site, 
and the co-ordination of the [4Fe-4S] "NB" cluster within 
DPOR is different to that of the [8Fe-7S] P cluster in 
nitrogenase [26]. After removal of DPOR-related sequences 
from our analysis by running a BLAST search against 
from our analysis by running a BLAST search against 
ChlB, BchB, ChlN and BchN, we observed that NifDK 
paralogs are represented in both diazotrophic and non- 
diazotrophic strains. Phylogenetic analysis of the BLAST- 
filtered subset revealed distinct groupings that are clearly 
divergent from conventional nitrogenase (Figure 4). These 
outgroups are also distinct from the DPOR enzymes, 
which form a separate clade (not shown in Figure 4). The 
existence of an outgroup of nitrogenase homologs (termed 
Group IV) has been noted previously [27], but the current 
availability of genome sequences has enabled more 
extensive analysis. It is highly unlikely that any of these 
nitrogenase-like proteins are competent to reduce dinitrogen as 
they lack ligands required to co-ordinate FeMoco. 
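
The DPOR-removal step described above can be illustrated with a small filter over BLAST tabular output. This is a sketch under stated assumptions rather than the procedure actually used: it assumes the hits are in the standard 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), that the subject identifiers are literally the subunit names, and that an E-value cutoff of 1e-5 is appropriate. The cutoff, identifiers, and example hit lines are all hypothetical.

```python
# DPOR subunits named in the text as BLAST targets for filtering.
DPOR_SUBUNITS = frozenset({"ChlB", "BchB", "ChlN", "BchN"})


def dpor_related_queries(blast_lines, max_evalue=1e-5):
    """Collect query IDs with a hit to a DPOR subunit at or below max_evalue.

    Each line is expected in 12-column BLAST tabular format, with the
    query ID in column 1, the subject ID in column 2 and the E-value
    in column 11.
    """
    flagged = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query_id, subject_id, evalue = fields[0], fields[1], float(fields[10])
        if subject_id in DPOR_SUBUNITS and evalue <= max_evalue:
            flagged.add(query_id)
    return flagged


def remove_dpor_related(candidate_ids, blast_lines, max_evalue=1e-5):
    """Return candidates that have no significant DPOR hit, keeping input order."""
    flagged = dpor_related_queries(blast_lines, max_evalue)
    return [cid for cid in candidate_ids if cid not in flagged]


# Hypothetical hits: seqA matches ChlB strongly, seqB matches BchN only weakly.
example_hits = [
    "seqA\tChlB\t45.0\t400\t220\t3\t1\t400\t5\t404\t1e-90\t300",
    "seqB\tBchN\t22.0\t150\t117\t4\t10\t160\t20\t170\t0.5\t30",
]
kept = remove_dpor_related(["seqA", "seqB", "seqC"], example_hits)
print(kept)
```

In practice the subject IDs would be database accessions mapped to ChlB, BchB, ChlN or BchN, and the cutoff would need to be chosen with care, since the nitrogenase-like sequences of interest are themselves distant homologs of DPOR.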

Representatives of these non-conventional enzymes 
cluster in distinct clades relative to the conventional 
NifDKEN, Vnf/AnfDK and the C-group DK proteins, 
which are coloured dark blue in Figure 4. The genes en- 
coding these non-conventional proteins are adjacent in 
genomes and have the potential to encode the alpha and 
beta subunits of nitrogenase-like enzymes. The lineages 
coloured either green or yellow in Figure 4 comprise 
groups of NifE or NifN related proteins that each contain 
the three conserved Cys residues involved in liganding 
the P cluster. The NifE-related subunits of partners 
coloured in green possess the FeMoco-ligand Cys275, 
but lack the highly conserved co-factor ligand, His 442. 
Those coloured in yellow lack both FeMoco ligands. It is 
possible that these proteins ligand a [4Fe-4S] cluster in 
a similar location to the P cluster in nitrogenase that 
delivers electrons to the active site. By analogy to NifEN, 
these enzymes may be able to reduce substrates with a 
limited number of electrons such as acetylene and azide 
[28]. These orthologs are found in diverse organisms, 
including the Proteobacteria, Archaea, Firmicutes and 
Fibrobacteres. Some organisms have an unusually large 
number of nitrogenase-like proteins of this class. For 
example, Syntrophobotulus glycolicus DSM 8271 contains 
nine protein pairs related to the alpha and beta subunits 
of nitrogenase. In two cases, these are organised as four 
linked genes (Sgly_0993, Sgly_0994, Sgly_0995, Sgly_0996 
and Sgly_2775, Sgly_2776, Sgly_2777 and Sgly_2778) 
potentially located in operons, suggesting that some of 
these gene pairs may provide scaffolding functions for 
co-factor assembly into the structural subunits, 
analogous to the nifDKEN gene clusters encoding 
conventional nitrogenase. 

Figure 4 Maximum-likelihood phylogenetic tree of conventional nitrogenases and nitrogenase-like sequences. The tree is represented by 
a core set of 73 sequences, selected from a larger tree of 472 sequences. Shimodaira-Hasegawa local support values were >0.6 except for those 
nodes marked with a red star. The clade coloring reflects sequences that are co-located in genomes and likely to correspond to the alpha and 
beta subunits of nitrogenase, with the exception of those shown in light gray, which are single subunit enzymes (NflD). Dark blue clades are 
conventional nitrogenases, labeled as NifD/E and NifK/N respectively. Clades colored in light-green are NifD/E and NifK/N-like sequences in which 
the FeMoco ligand Cys275 in the alpha component is either present (dark green nodes) or absent (yellow nodes). In all other cases known 
FeMoco ligands are absent. The number of conserved Cys residues in each subunit that correspond to P cluster ligands in conventional 
nitrogenases are indicated for each clade. 

More diverse representatives of the nitrogenase-like 
sequences are found in the Archaea and Firmicutes. These 
proteins lack FeMoco ligands and contain a variable 
number of conserved cysteine residues that may ligand a 
[Fe-S] cluster. For example, Clostridium botulinum strains 
and Alkaliphilus oremlandii encode NifEN-like sequences 
(coloured light blue in Figure 4) that are located 
downstream of genes encoding NifH and a potential ATPase 
component of the ABC transporter family. Their 
NifE-related components (CLM_0808 and Clos_0313) contain 
the three conserved P cluster ligands, but conserved Cys 
residues are not present in the NifN-like components 
(CLM_0809 and Clos_0314). In contrast, 
Methanocorpusculum labreanum Z and Desulfitobacterium hafniense 
DCB-2 encode proteins with two conserved Cys residues 
(corresponding to αC88/αC62 and αC154/αC124) in the 
NifD/E-related components (Mlab_1040 and Dhaf_1539) 
and only a single conserved Cys residue (corresponding to 
βC95/βC44) in the NifK/N related subunits (Mlab_1039 
and Dhaf_1540). Representative species from the Human 
Microbiome project, including Coprococcus catus GD/7 
and Dorea longicatena DSM 13814, also appear in these 
clades (coloured red in Figure 4) and possess nitrogenase- 
like sequences with a similar arrangement of conserved 
cysteines. These organisms encode two closely linked 
copies of NifHEN-like sequences in their genomes. It is 
possible that a residue other than cysteine serves to co- 
ordinate an [Fe-S] cluster in representatives of these 
clades, as observed in the case of DPOR, which utilises 
an aspartate residue as a cluster ligand [26]. 

A variation in the arrangement of the subunits in these 
nitrogenase-like sequences is observed in some 
representatives of the Archaea, Firmicutes and 
Deltaproteobacteria, whereby nifH and nifE-like genes are fused to 
form a single open reading frame that is followed by a 
nifN-like gene (data not shown). In contrast, several 
representatives of the Archaea possess only a single gene 
encoding a homolog of the alpha and beta chains of nitro- 
genase (e.g. Metvu_0736, Mpal_0679 and Mbur_1037) 
(coloured grey in Figure 4). These form part of the out- 
group identified by Raymond et al. [27] and are designated 
as NflD. These single subunit enzymes contain conserved 
Cys residues (corresponding to αC88/αC62 and αC154/αC124 
in NifD/E) and are frequently annotated as putative 
methanogenesis marker 13 metalloproteins, which are 
thought to function in methanogenesis. 

Discussion 

Biological nitrogen fixation is thought to be one of the 
most ancient enzyme-catalyzed reactions [27]. The elab- 
orate architecture of its catalyst, which supports a com- 
plex reaction mechanism for dinitrogen reduction, has 
long been the subject of interest, not only from the view- 
point of evolutionary perspective and system complexity, 
but also as a fundamental biological process that can be 
exploited to develop new strategies for agricultural soil 
fertilization. The unpredictable occurrence of this meta- 
bolic trait across taxonomic groups, combined with the 
challenge of experimental detection of nitrogen fixation, 
makes it difficult to obtain a comprehensive census of 
prokaryotes with the capacity for diazotrophy. 

The universal presence of gene sequences coding for 
the nitrogenase catalytic components in diazotrophs 
(nifH and nifD) is commonly used as a search tool in 
many phylogenetic studies. However, when using a 
single-gene survey in the database of microbial sequenced 
genomes, we detected orphan false-positive hits in several 
non-diazotrophic genomes. For example, the 
Methanobrevibacter ruminantium M1 and Methanocaldococcus 
fervens AG86 genomes include only a sequence similar to 
NifH, while the Methanosphaera stadtmanae DSM 3091 
genome contains only a NifD-like sequence. In this case 
orphan nifD-like sequences may be evolutionary relics of 
divergent enzymes in which the NifD/E component does 
not contain conserved FeMoco ligands (see below). Thus 
genome analysis of environmental samples based purely 
on BLAST hits to NifH or NifD may lead to false 
indications of diazotrophy. To eliminate hits from orphan 
sequences our initial approach was to search in silico for 
the co-occurrence of NifH and NifD and then subse- 
quently filter these hits for the occurrence of other nitro- 
gen fixation protein sequences. 

Many previous studies have focussed on NifH and 
NifD sequences as markers for the phylogenetic distribu- 
tion of diazotrophs. However, BLAST searches at rela- 
tively low threshold identified nitrogenase-like sequences 
lacking FeMo-co ligands (Figure 4). 

False positives can therefore be obtained if only NifH 
and NifD are used in the search criteria. Extending the 
gene set to NifHDK or even to NifHDKB can also give 
rise to false positives, because sequences similar to the α 
and β subunits of nitrogenase can be associated with 
NifH-like and NifB-like genes (Additional file 4: Figure S2). 
The strict requirement of a separate set of proteins involved 
in the assembly and synthesis of the active site cofactor, 
FeMoco, provides strong indication that the presence of 
nifH and nifD coding sequences alone does not provide 
enough evidence for diazotrophy. Therefore, our rationale 
was first to determine the inventory of nif genes that were 
always present in known-diazotrophic species. Literature 
searches combined with BLAST analyses led to the proposal 
that nitrogen fixation requires at least 6 gene products 
(Figure 1). Using this criterion, we found 67 species that we 
hypothesize have the metabolic capacity for nitrogen 
fixation. Our computational assignments provide a good 
indication that these species are potential diazotrophs 
and give direction to experimentalists to validate these 
predictions. 

Our in silico assignments predict that nearly 15% of 
prokaryotic species with sequenced genomes are either 
known or potential diazotrophs, a fraction much larger 
than commonly accepted. The biased distribution of 
sequenced genomes in relation to taxonomic groups 
probably undermines a robust evaluation of the taxo- 
nomic diversity of nitrogen fixation in nature. For ex- 
ample, the phylum Proteobacteria has 409 genomes from 
distinct species, while Thermomicrobia is represented by 
only one. Efforts towards detailed functional assignments 
of biochemical pathways were also compatible with our 
findings. The SEED database [29] lists the occurrence of 
20 nif genes in 45 unique species, and in all cases the 
minimum gene set is present. Almost all of these species 
are included in this study, the only exception being Mag- 
netospirillum gryphiswaldense, which was not in the 
NCBI database of completed sequenced genomes at the 
time this study was completed. It is probable that 
nitrogen fixation also occurs in many other diverse species 
whose phyla are underrepresented in current databases. 
Therefore, applying the minimum gene set to newly 
sequenced genomes as they become available can lead to 
the identification of many other diazotrophs and further 
expand the diversity of diazotrophs in terms of the 
taxonomic distribution of this metabolic trait. 
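
The "nearly 15%" estimate can be re-derived from counts reported earlier in the text. The short calculation below only recombines those published numbers (1002 species searched, 174 with both NifH-like and NifD-like sequences, 92 of these without experimental demonstration, 67 meeting the minimum gene set, and 134 of 149 species encoding only the Mo-dependent enzyme); the variable names are ours.

```python
# Counts quoted in the text.
total_species = 1002        # distinct species in the initial search
with_nifh_and_nifd = 174    # species with both NifH-like and NifD-like hits
not_demonstrated = 92       # of those, no experimental evidence of fixation
new_meeting_min_set = 67    # undemonstrated species meeting the minimum set
mo_only = 134               # species encoding only Mo-dependent nitrogenase

known_diazotrophs = with_nifh_and_nifd - not_demonstrated
all_diazotrophs = known_diazotrophs + new_meeting_min_set

percent_diazotrophs = 100 * all_diazotrophs / total_species
percent_mo_only = 100 * mo_only / all_diazotrophs

print(known_diazotrophs, all_diazotrophs)
print(round(percent_diazotrophs, 1), round(percent_mo_only, 1))
```

The totals agree with the 82 known and 149 overall diazotrophic species given in the abstract, and 149 of 1002 species corresponds to about 14.9%.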

Our study revealed a set of species for which our cri- 
teria for in silico prediction of nitrogen fixation were not 
satisfied, as they lack NifEN but nevertheless retain the 
nitrogenase structural genes together with nifB and nifV. 
Paradoxically, recent phylogenetic analysis suggests that 
NifDK homologs present in strains lacking NifN, such as 
Caldicellulosiruptor saccharolyticus, Candidatus Desul- 
forudis audaxviator and Methanocaldococcus sp. FS406- 
22, emerged after the ancestral Mo enzymes found in 
hydrogenotrophic methanogens such as M. maripaludis, 
which have a complete FeMoco assembly pathway repre- 
sented by early branching lineages of NifE and NifN 
[24,25]. Nevertheless, the uncharacterised nitrogenases 
belonging to the C group appear to have evolved prior to 
the emergence of most NifDK homologs in both Archaea 
and Bacteria. Our studies indicate that although the cata- 
lytic components contain structural motifs competent to 
coordinate FeMoco, these proteins have a distinct envir- 
onment surrounding their co-factor ligands, which may 
confer unique maturation or catalytic properties. The 
presence of diazotrophic species within this group sug- 
gests that these nitrogenases may have distinct character- 
istics that permit a more parsimonious mechanism for 
FeMoco assembly. Without exception, organisms in the 
C-group that lack either NifN or NifEN are thermophiles 
inhabiting diverse environmental niches. Biochemical 
studies that mimic the absence of NifEN demonstrate 
that a NifDK enzyme containing NifB-co rather than 
FeMoco exhibits hydrogen evolution and retains some
ability to reduce acetylene, but not dinitrogen. Addition 
of molybdenum and homocitrate to the NifB-co contain- 
ing enzyme did not influence substrate reduction [30]. 
Potentially, however, thermal adaptation might permit 
the assembly of FeMoco on a modified scaffold or per- 
haps on the NifDK subunits themselves. Further charac- 
terisation of nitrogen fixation and the properties of 
nitrogenase in these thermophilic organisms will be 
required to establish if FeMoco can indeed be assembled
via an alternative route. 

Our studies have highlighted a number of nitrogenase-like
homologs belonging to the oxidoreductase/nitrogenase
component 1 family, which may have different metabolic 
functions compared to the well-characterised canonical 
representatives, nitrogenase and protochlorophyllide reduc- 
tase. Structural studies reveal that the fold of these two 
enzymes is remarkably similar, with equivalent positioning 
of the [Fe-S] clusters enabling a similar mechanism of ATP-driven
electron transfer from the reductase protein to the
catalytic component. Diversity of substrate reduction is pro- 
vided by the presence of a cleft in the catalytic component 
that can either accommodate a large cofactor (FeMoco) or 
a large substrate (protochlorophyllide). Although none of
the alpha subunit related sequences we have analysed con- 
tain the FeMoco ligand His442, it is not possible to distin- 
guish whether the function of these sequences is likely to 
relate to catalysis (i.e. NifDK-like) or to biosynthesis 
(i.e. NifEN-like). Biochemical and structural studies of 
NifEN reveal its functional diversity, since it can catalyse 
cluster conversion, molybdenum incorporation into the 
cofactor in association with NifH, and potentially the incorporation
of homocitrate into FeMoco [9]. Although the
primary role of NifEN is to provide the machinery for 
FeMoco biosynthesis, it has also been shown to catalyse 
reduction of some nitrogenase substrates, albeit with rela- 
tively low efficiency [13]. 

Nitrogenase-like sequences could potentially perform 
analogous roles in association with a NifH-like compo- 
nent. The genomic organisation of these proteins may 
provide some clues to their possible metabolic functions 
(Additional file 4: Figure S2). We note that sequences 
possessing the equivalent of Cys275 in the alpha subunit 
are commonly associated with O-acetyl homoserine sulf- 
hydrolase or cysteine synthase, suggesting a potential in- 
volvement in sulphur metabolism (e.g. Rhodospirillum 
rubrum ATCC 11170, Clostridium beijerinckii NCIMB 
8052, Geobacter sp. FRC-32, Additional file 4: Figure S2). In 
other cases, nitrogenase-like sequences are co-located with 
ABC transporter systems (e.g. Clostridium cellulovorans 
743B, Methanocorpusculum labreanum Z, Clostridium 
botulinum A2 Kyoto-F). This might provide a
mechanism for coupling metal transport to the assembly of 
a metal cofactor. In Coprococcus catus GD/7 and other 
representatives of the Firmicutes, NifHEN-like proteins are 
associated with hydrogenase maturation proteins and may 
possibly play a role in the assembly of the active site 
metallocluster. The NifD proteins present in methanogenic 
Archaea have been proposed to function in coenzyme F430 
biosynthesis, and NflD has been shown to co-purify with a 
NifH-like protein, NflH [31]. In some cases we observe that
NflD homologs are adjacent to NflH and a gene involved in 
a late step in cobalamin biosynthesis, which encodes cobyri- 
nic acid a,c-diamide synthase (Additional file 4: Figure S2). 
This may imply that these proteins function in cobalamin 
reduction. 

The NflD single subunit enzymes appear to be the 
early ancestors of both the bacteriochlorophyll biosyn- 
thesis proteins (BchN and BchB) and the nitrogenases 
(Nif/Vnf/AnfDK) [24,27,31]. Recent evolutionary studies 
suggest that nitrogen fixation originated after the emer- 
gence of bacteriochlorophyll biosynthesis [25] and conse- 
quently spread to diverse microbial lineages via lateral 
gene transfer [24,27]. Potentially, the additional NifDK- 
like sequences that we have identified may be representa- 
tive of ancestors that arose after the duplication event 
that led to the emergence of the alpha and beta subunits 
of nitrogenase and evolved to perform various metabolic 
functions. It is important to note that thus far we have
only identified nitrogenase-like sequences in obligate or 
facultative anaerobes, consistent with the view that nitro- 
genase evolved in anaerobic methanogens and Firmicutes 
[25]. As noted above, these early forms may not have
functioned as catalysts, but might have had roles in 
metallocluster biosynthesis. Although current informa- 
tion on the role of these nitrogenase-like sequences is 
sparse, future biochemical and structural studies on this 
hitherto unrecognised group of proteins are likely to 
provide a rich source of information concerning the 
evolution and catalytic diversity of these nitrogenase 
homologs. 

Conclusions 

This work led to the identification of 67 potential diazo- 
trophic species included in twelve taxonomic phyla, indi- 
cating that this metabolic trait is more widespread than 
formerly predicted. The identification of a minimum 
gene set required for nitrogen fixation provides a more 
robust method for the in silico prediction of this biochemical
pathway. The occurrence of nif-orphan
sequences or incomplete gene sets in several species 
questions single-gene approaches used in phylogenetic 
studies of nitrogen fixation. Furthermore, our analysis
highlights the presence of nitrogenase-like sequences 
with potential to catalyze as-yet unidentified functions. 

Methods 

Survey of nitrogen fixing genes in prokaryotic genomes 

Nitrogen fixing genes present in species with completely 
sequenced genomes were identified through the protein
database of microbial genomes at the National Center
for Biotechnology Information up to July 17th 2011. For
species with more than one sequenced genome, a single
representative was manually selected, resulting in 999
unique bacterial species and 93 unique archaeal species.
BLAST [32] searches used the A. vinelandii nitrogen fixing
protein sequences as queries: NifH (Avin_01380), NifD
(Avin_01390), NifK (Avin_01400), NifE (Avin_01450), NifN
(Avin_01460), NifU (Avin_01620), NifS (Avin_01630), NifV
(Avin_01640), NifB (Avin_51010), NifQ (Avin_51040), AnfG
(Avin_48980), and VnfG (Avin_02600). Initially, hits were
selected based on a relatively weak threshold (> 20% amino
acid identity over the query length); the initial list was then
refined using the minimum gene set criterion, hits to anf/vnfG,
and the presence of synteny, yielding the protein sequences
listed in Additional file 2: Tables S2, S3 and S4.
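
The initial, permissive selection of BLAST hits can be sketched as follows. This is a minimal illustration, not the code used in the study: the `Hit` record, the function names and the example locus names are hypothetical, and "identity over the query length" is interpreted here as the number of identical residues divided by the full length of the query protein, which is an assumption about how the threshold was computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str         # query protein name, e.g. "NifH"
    subject: str       # identifier of the matched protein (hypothetical here)
    identical: int     # identical residues in the alignment
    query_length: int  # full length of the query protein


def identity_over_query(hit):
    """Percent identity normalised by the query length (assumed definition)."""
    return 100.0 * hit.identical / hit.query_length


def select_initial_hits(hits, min_identity=20.0):
    """Keep hits whose identity over the query length exceeds the threshold."""
    return [h for h in hits if identity_over_query(h) > min_identity]
```

Normalising by the query length rather than by the alignment length penalises short local matches, so a fragment that aligns well over a small region does not pass the threshold on its own.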

Selection and phylogenetic analysis of nitrogenase-like 
sequences 

An initial list of 75 NifD/E and NifK/N-like sequences 
belonging to the PFAM family PF00148 was selected
manually from the IMG database [33] (http://img.jgi.doe.gov)
and then used as queries in a BLAST [32] search
against the NCBI NR protein database with an e-value
cut-off of 10^-20. This returned 1117 unique gene IDs,
which were then filtered against known NifD/E and 
NifK/N sequences (Additional file 2: Table S3) to remove 
hits to conventional nitrogenase. The remaining 900 
unique gene IDs were further filtered with a BLAST 
search against ChlB (accession GenBank:AAT28195.1), 
BchB (SwissProt:Q3APL0.1), ChlN (GenBank:AAP99591.1) 
and BchN (SwissProt:Q3APK9.1) to remove homologs of 
protochlorophyllide reductase. Fused protein sequences
(NifHD/E) were also filtered out and were not subject to 
further phylogenetic analysis. An additional filtering step used
a preliminary tree built with FastTree 2.1 [34] to
identify very similar sequences; only one member of each 
set of similar sequences was kept. The final compilation 
contained 472 unique gene IDs. 
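
The successive filtering steps described above can be summarised as a series of set operations. The Python sketch below is a simplified illustration under stated assumptions: the function name and all inputs are hypothetical, each exclusion list is assumed to have been produced beforehand (for example by the BLAST comparisons described in the text), and keeping the alphabetically first identifier of each group of very similar sequences is an arbitrary choice, since the text does not state how the retained member was chosen.

```python
def filter_nitrogenase_like(hits, conventional, por_like, fused,
                            similar_groups, evalue_cutoff=1e-20):
    """Apply the filtering steps in the order given in the text.

    hits           -- dict mapping gene ID to its best BLAST e-value
    conventional   -- IDs matching known NifD/E and NifK/N sequences
    por_like       -- IDs matching ChlB, BchB, ChlN or BchN
    fused          -- IDs of fused (NifHD/E) proteins
    similar_groups -- iterable of sets of very similar sequence IDs
    """
    # Step 1: keep hits at or below the e-value cut-off.
    kept = {gid for gid, evalue in hits.items() if evalue <= evalue_cutoff}
    # Steps 2-4: remove conventional nitrogenase, protochlorophyllide
    # reductase homologs and fused proteins.
    kept -= set(conventional)
    kept -= set(por_like)
    kept -= set(fused)
    # Step 5: keep one member of each group of very similar sequences
    # (here the alphabetically first ID, an arbitrary assumption).
    for group in similar_groups:
        members = sorted(kept & set(group))
        kept -= set(members[1:])
    return sorted(kept)
```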

Manual inspection of the 472-sequence tree yielded a 
"core" list of 73 representative sequences. These 73 
sequences were then aligned with ClustalW version 2.1 
[35] with the Gonnet 250 protein matrix and default 
pairwise alignment options. A phylogenetic tree was built 
with FastTree 2.1 [34] using the WAG + gamma20 likeli- 
hood model; the result is shown in Figure 4. 
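
For readers wishing to reproduce a comparable alignment and tree, the two steps might be invoked as sketched below. The exact command lines used in the study are not given in the text, so everything here is an assumption: the executable names (`clustalw2`, `FastTree`), the file names and the choice of FASTA as the intermediate format are hypothetical, and the options shown are standard ClustalW and FastTree flags selected to match the description (Gonnet protein matrix; WAG model with gamma likelihood).

```python
import subprocess


def clustalw_command(infile, outfile):
    """Argument list for a ClustalW protein alignment with the Gonnet matrix."""
    return [
        "clustalw2",            # executable name is an assumption
        f"-INFILE={infile}",
        "-ALIGN",
        "-TYPE=PROTEIN",
        "-MATRIX=GONNET",
        "-OUTPUT=FASTA",
        f"-OUTFILE={outfile}",
    ]


def fasttree_command(alignment):
    """Argument list for a FastTree run with the WAG model and gamma likelihood."""
    return ["FastTree", "-wag", "-gamma", alignment]


def build_tree(sequences, alignment, tree_file):
    """Align the sequences, then write the FastTree Newick output to tree_file."""
    subprocess.run(clustalw_command(sequences, alignment), check=True)
    with open(tree_file, "w") as handle:
        # FastTree prints the tree to standard output.
        subprocess.run(fasttree_command(alignment), check=True, stdout=handle)
```

Building the argument lists separately from running them makes it easy to inspect or log the exact commands before execution.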

Additional files 



Additional file 1: Table S1. Reference table of known diazotrophs
[36-109]. 

Additional file 2: Table S2. Nitrogen fixation genes (locus tags) of 
known diazotrophs. Table S3. Nitrogen fixation genes (locus tags) of 
potential diazotrophs. Table S4. Nitrogen fixation genes (locus tags) of 
Group-C species. 

Additional file 3: Figure S1. Neighbor joining phylogenetic tree of the
Nif/Vnf/AnfD and K sequences derived from the species shown in
Figure 3. 

Additional file 4: Figure S2. Gene neighborhoods of selected 
nitrogenase-like proteins. 